 report card


Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries

Yang, Blair, Cui, Fuyang, Paster, Keiran, Ba, Jimmy, Vaezipoor, Pashootan, Pitis, Silviu, Zhang, Michael R.

arXiv.org Artificial Intelligence

The rapid development and dynamic nature of large language models (LLMs) make it difficult for conventional quantitative benchmarks to accurately assess their capabilities. We propose report cards, which are human-interpretable, natural language summaries of model behavior for specific skills or topics. We develop a framework to evaluate report cards based on three criteria: specificity (ability to distinguish between models), faithfulness (accurate representation of model capabilities), and interpretability (clarity and relevance to humans). We also propose an iterative algorithm for generating report cards without human supervision and explore its efficacy by ablating various design choices. Through experimentation with popular LLMs, we demonstrate that report cards provide insights beyond traditional benchmarks and can help address the need for a more interpretable and holistic evaluation of LLMs.


The Report Card on Guaranteed Income Is Still Incomplete

NYT > Economy

Silicon Valley billionaires and anti-poverty activists don't have a lot in common, but in recent years they've joined forces around a shared enthusiasm: programs that guarantee a basic income. Tech entrepreneurs like Sam Altman, chief executive of OpenAI, have promoted direct cash transfers to low-income Americans as a way to cushion them from what the entrepreneurs anticipate could be widespread job losses caused by artificial intelligence. Some local politicians and community leaders, concerned about growing wealth inequality, have also put their faith in these stipends, known as unconditional cash or, in their most ambitious form, a universal basic income. Dozens of small pilot projects testing unconditional cash transfers have popped up in communities around the country, from Alaska to Stockton, Calif. Andrew Yang, an entrepreneur, put the idea of $1,000 monthly payments for all adults at the center of his 2020 presidential campaign.


How Salesforce is using AI to create a more hyper-informed, more adoptable design system | Inside Design Blog

#artificialintelligence

What if you were floating on an ocean of design data--millions of small decisions made across industries, cultures, and countries? How would you begin to fish out the innumerable, invaluable insights and patterns hiding there? Salesforce is on course to find out. It begins with Salesforce's Lightning platform. Customers can use it to drag, drop, and tweak components until they bring an entire working application into existence--no coding skills required.


AI scores candidates' facial movements, words and voice to determine how qualified they are

Daily Mail - Science & tech

Your resume may not be the only deciding factor in landing your next job – it could be an 'employability score' created by artificial intelligence that has the final vote. More than 100 big-name firms are using HireVue's AI-driven assessment, a technology that ranks candidates based on their facial movements, choice of words and speaking voice. Although employers can pursue any candidate, some have told The Washington Post that they usually focus on those the computer system liked best, leading some experts to question how biased the process may be. HireVue's technology is employed by many big-name companies such as Hilton Hotels, Unilever and Goldman Sachs, according to The Washington Post. And with hundreds of applications flooding in for just a single position, the AI has made it easy for human employers to find the perfect candidate, but some experts believe the technology can do more harm than good.


Silicon Valley's 2017 Report Card

MIT Technology Review

When future historians of Silicon Valley look back at 2017, they'll see a time when America's most powerful tech companies and the venture capital ecosystem that created them came under unprecedented scrutiny from politicians and the public. The region's innovation engine produced numerous technical advances, but controversy over fake news and revelations about sexual harassment of female entrepreneurs have cast a shadow over the Valley this year. Big tech companies in the San Francisco Bay Area were busier than ever in 2017, and artificial intelligence was a top priority for many of them. Among a long list of AI initiatives, Google launched TensorFlow Lite, a lightweight version of its open-source machine-learning software that has accelerated AI adoption among companies. The new version enables AI to run on mobile phones and household gadgets such as fridges and speakers.


Understanding Behavioral Economics to Change Behaviors with Big Data

@machinelearnbot

My good friend Vinnie participates in an automobile insurance program that rewards him for good driving behaviors; the better driving behaviors he exhibits, the more money he saves on insurance. You stick a device into the vehicle's diagnostic port (usually under the steering wheel in most vehicles manufactured after 1996), and the automobile insurance company tracks your driving behaviors and offers you automobile insurance discounts based upon the quality of your driving behaviors. The program actually "grades" driving behaviors including acceleration, turning, speed and braking, and once a month sends a report card on the past month's driving performance (see Report Card in Figure 1). And for my friend Vinnie, as a result of sharing his detailed driving data, he saved $1.49 over the past 6 months. Vinnie is saving $2.98 per year by sharing his detailed driving data with his auto insurance company.